Subtopic Structuring for Full-Length Document Access

Authors

Marti A. Hearst

Christian Plaunt

Abstract

We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
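To make the idea of partitioning a text into coherent multi-paragraph units concrete, here is a minimal sketch. It is not the authors' method, which the abstract does not spell out. It assumes a simple lexical-overlap heuristic: consecutive paragraphs stay in the same unit while their word-count vectors remain similar, and a new unit starts when the similarity drops. The function names, the tokenizer, and the threshold value are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercased word counts for one paragraph (assumed, very simple tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(paragraphs, threshold=0.1):
    """Group consecutive paragraphs into units.

    A new unit starts whenever the similarity between a paragraph and the
    one before it falls below `threshold` (an assumed, untuned value).
    """
    if not paragraphs:
        return []
    units = [[paragraphs[0]]]
    for prev, cur in zip(paragraphs, paragraphs[1:]):
        if cosine(term_counts(prev), term_counts(cur)) < threshold:
            units.append([cur])      # vocabulary shifted: open a new unit
        else:
            units[-1].append(cur)    # same subtopic: extend the current unit
    return units


paragraphs = [
    "volcano lava eruption volcano",
    "lava flows after the eruption",
    "moon orbit tides",
    "tides follow the moon orbit",
]
units = segment(paragraphs)
```

In this toy input the first two paragraphs share vocabulary, as do the last two, while the middle pair shares none, so the sketch places a single boundary between them. Real documents would likely need stop-word handling, stemming, and comparison over larger blocks than single paragraphs.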

Similar resources

University of Glasgow at the NTCIR-9 Intent task: Experiments with Terrier on Subtopic Mining and Document Ranking

We describe our participation in the subtopic mining and document ranking subtasks of the NTCIR-9 Intent task, for both Chinese and Japanese languages. In the subtopic mining subtask, we experiment with a novel data-driven approach for ranking reformulations of an ambiguous query. In the document ranking subtask, we deploy our state-of-the-art xQuAD framework for search result diversification.

Full discrimination of subtopics in search results with keyphrase-based clustering

We consider the problem of retrieving multiple documents relevant to the single subtopics of a given web query, termed “full-subtopic retrieval”. To solve this problem we present a novel search results clustering algorithm that generates clusters labeled by keyphrases. The keyphrases are extracted from the generalized suffix tree built from the search results and merged through an improved hier...

THUSAM at NTCIR-11 IMine Task

This paper describes our approaches and results in the NTCIR-11 IMine task. In 2014, we participated in the Chinese/English Subtopic Mining and Chinese Document Ranking subtasks. In the Subtopic Mining subtask, we mine subtopic candidates from various resources and construct the subtopic hierarchy with several different strategies. In the Document Ranking subtask, we rerank the result lists with HITS algorit...

NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task

Users express their information needs as queries to find relevant documents on the web. However, users’ queries are usually short, so search engines may not have enough information to determine their exact intents. How to diversify web search results to cover users’ possible intents as widely as possible is an important research issue. In this paper, we will propose several subt...

HITSZ-ICRC at NTCIR-12 Temporal Information Access Task

This paper presents the methods the HITSZ-ICRC group used for the Temporalia-2 task at NTCIR-12, including the Temporal Intent Disambiguation (TID) subtask and the Temporal Diversified Retrieval (TDR) subtask. In the TID subtask, we merged the results of a rule-based method and a method based on word temporal intent class vectors to estimate the distribution of temporal intent classes for English and Chinese queries. The...

Publication date: 1993
